ABSTRACT - Markovian models are developed for the performance analysis of
multiprocessor systems intercommunicating via a set of busses. The performance
index is the average number of active processors, called processing power.
From the processing power a variety of other performance measures can be
derived, as dictated by the specific processor application. Exact models are
first introduced, and are illustrated with a simple example. The computational
complexity of the exact models is shown to increase very rapidly with system
size, making the exact analysis impractical even for medium size systems.
To overcome the complexity of computation, several approximate models are
introduced. The approximate results are compared with the exact ones and found
to be surprisingly accurate for a wide range of configurations. Simulation is
used to validate the analytic models and to test their robustness.
1. INTRODUCTION


Tightly connected multiprocessor systems are characterized by the presence
of several processing units and one or more common memory areas, used by
the processors for the exchange of information and, possibly, for the storage of
common code and data structures of infrequent use. Processors and common
memories are connected by some kind of communication system, usually called
interconnection network.

Early multiprocessor systems were developed using crossbar networks to
connect processors and memories. A widely known crossbar multiprocessor
system is C.mmp, the Carnegie Mellon multiminicomputer [WULF72]. The performance
of crossbar multiprocessors has been widely analyzed in recent years [BHAN75,
BASK76, HXK377, SBnr77, WILL78].

With the availability of inexpensive microprocessors, multiprocessor
systems with a very large number of components are now becoming feasible and
cost effective. For such systems a crossbar interconnection network may be
intolerably expensive, and in general it would provide a bandwidth much higher
than needed. A more attractive alternative is represented by bus-oriented
interconnection networks. Single or multiple bus architectures can be used,
according to the bandwidth required for the specific application. These
interconnection networks are generally called "multiple-bus" or "highway deficient"
[WILL78] networks. Some papers addressing the analysis of bus systems
appeared very recently in the literature [Chc»N77, FUNG78, WILL78].
This report presents exact and approximate Markovian models for the
analysis of multiple-bus multiprocessor systems. Section 2 describes the
basic multiprocessor system investigated in this study. In section 3 the
model for performance analysis is presented and the assumptions on system
operations are discussed. Section 4 derives a variety of application-oriented
performance indices. Section 5 provides an exact model for a simple crossbar
architecture. Section 6 discusses exact models for general multibus
architectures, whereas section 7 derives some approximate, but computationally very
efficient models. In section 8 stochastic Petri net models are introduced.
In section 9 exact and approximate analytic results are compared, and
simulation results are presented.


2. THE MULTIPLE PROCESSOR SYSTEM


This study considers multiple processor systems that exchange information
through a common memory which consists of several modules. Processors
and common memory modules are connected by a set of "global busses". Each
global bus can connect any processor to any memory module. Every processor is
also connected (and has exclusive access) to a private memory. The block
diagram of a system with 3 processors, 3 memory modules and 2 busses is shown in
fig. 1.

The exchange of information is accomplished by first writing the
information in the appropriate common memory module and then having the
destination processor read it. Due to the sharing of both memory modules and busses,
contention may arise, causing processors to queue for a resource which is
currently in use. If the number of busses b is greater than or equal to the
smaller of the number of processors p and the number of memories m, i.e.
b ≥ min(m,p), then contention is caused only by the sharing of memory
modules: a processor can always find a free bus to access a free
common memory. If, on the other hand, the inequality is not satisfied, a
processor may be forced to wait for a memory which is currently free because no
bus is available.

Multiple processor systems for which the inequality holds are usually
known as "crossbar" architectures. Note that in general it is not wise to set
b > min(m,p) unless we want to add some redundancy in the interconnection
network for reliability purposes. In fact, the availability of extra busses
neither affects the crossbar system model nor improves its throughput.
Multiple processor systems for which the inequality does not hold
are usually called "highway deficient" systems or "(multiple) bus"
architectures (where the word multiple is dropped in the case of b=1). For these
systems we assume throughout this report that p ≥ m > b. The case m > p can be
analyzed using the same techniques described here; the models are generally
simpler than those presented in this report.
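The classification rule above can be stated compactly. The following Python sketch is only an illustration: the function name `classify` and the labels it returns are our own, not the report's. It applies b ≥ min(m,p) for crossbar behavior, treats busses beyond min(m,p) as redundancy only, and otherwise reports a bus architecture (the models in this report additionally assume p ≥ m > b for that case).

```python
def classify(p: int, m: int, b: int) -> str:
    """Classify a p x m x b system (processors x memories x busses).

    Hypothetical helper illustrating section 2: the system behaves as a
    crossbar when b >= min(m, p); busses beyond min(m, p) add redundancy only.
    """
    if min(p, m, b) < 1:
        raise ValueError("p, m and b must all be positive")
    k = min(m, p)
    if b > k:
        return "crossbar (redundant busses)"
    if b == k:
        return "crossbar"
    # b < min(m, p): a free memory may be unreachable because no bus is free
    return "single bus" if b == 1 else "multiple bus (highway deficient)"


print(classify(2, 2, 2))   # crossbar
print(classify(4, 3, 2))   # multiple bus (highway deficient)
```

Only the sizes p, m and b enter the test, which is why the same system model serves every crossbar regardless of how many extra busses it has.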

It is possible to construct a queueing network model for the analysis of
both types of systems. The general case is shown in fig. 2. Processors join
memory queues, and before proceeding to service (i.e. accessing memory) they
must be granted a permit (bus). The permit is returned upon completion of
service. The general model is thus a closed queueing network with p classes
of customers and with passive resources [CH?^8, KELL76b], which in this case
represent the busses. In the case of crossbar architectures the presence of
busses can be ignored, thus making the analysis substantially simpler than for
multiple bus systems.
3. THE MODEL


Models of multiprocessor systems are developed both to gain a deeper
understanding of their behavior and to obtain a set of performance indices
that can be used to guide the design of actual systems.

A model cannot include all the details of the system; rather, it is an
abstraction of the real system that includes the features relevant to the
analysis. Different models are generally constructed, depending on the nature
of the application and the degree of detail required by the study. In our
case, the central feature of the system is the limitation of the overall
processing capability due to contention for memories and busses. Our models
will therefore focus on the loss of processing power due to this contention.

In general we say that a processor can be in one of three different
states:

(1) The processor can execute in its private memory.

(2) The processor can exchange data with other cooperating processors,
by reading from, or writing into, the common memory modules.

(3) The processor can be waiting to access a common memory module.

We say that a processor is ACTIVE when it is in state (1), and the goal
of our analysis is to determine the average percentage of time during which
processors are active.
By introducing an ergodic assumption we can say that the above quantity
is equal to the average number of active processors divided by the total
number of processors. This quantity is usually known in the literature as
Processing Efficiency. As the number of processors is a known constant, we can
simply evaluate the average number of active processors, called the Processing
Power of the system (P):

$$ P = E[\#\ \text{active processors}] \qquad (1) $$

P is the main performance index considered in the sequel. Other
important performance measures are simply related to P, as shown in section 4.

The following assumptions are made regarding the operation of the
system:


a) Processors perform a background activity that only requires accesses
to the processor's private memory.

b) From time to time processors exchange information, and thus access
the common memory, performing read/write operations.

c) The duration of the access to the common memory is an independent,
exponentially distributed random variable with mean $1/\mu_j$ for the j-th memory
module.

d) When a processor requires access to a common memory module, a path
is immediately established (with zero delay) between the processor and the
referenced memory module, provided that a bus is available and the memory is
not being accessed by another processor.

e) If a path cannot be established, the processor idles, waiting for the
necessary resource(s). (This may not be true for multiprocessor systems using
an interrupt mechanism; the hypothesis is conservative anyway.)

f) Upon memory access completion, memory and bus are immediately
released (with zero delay) and the processor resumes its background activity.
The interval between subsequent access requests is an independent,
exponentially distributed random variable with mean $1/\lambda_i$ for the i-th processor.

g) An access request from processor i is directed to memory j with
probability $p_{ij}$. Thus, the access rate from processor i to memory j is defined
as $\lambda_{ij} = \lambda_i p_{ij}$.

The above assumptions guarantee that a Markovian model can be
constructed. Unfortunately, this does not guarantee that a solution (closed form or
numerical) can then be easily obtained. In particular, such models show an
explosion of the number of states when the number of system components is
increased, and the analysis rapidly becomes very tedious even for moderately
complex systems.

In order to reduce the number of states we introduce three further
assumptions.

h) All processors are assumed to have the same common memory access rate,
λ, and all memories are assumed to be equal, so that the average memory access
time (1/μ) is the same for all memories and all processors.

i) A uniform reference model is assumed; this implies that every access
request from every processor is directed to any memory with equal probability
1/m, where m is the number of common memory modules.

l) When a bus goes idle, the next processor to use the bus is selected
at random among the heads of the queues referencing memories which have become
free.

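In formulae, assumptions h) and i) state that:

$$
\begin{aligned}
&\lambda_1 = \lambda_2 = \cdots = \lambda_p = \lambda, \qquad \mu_1 = \mu_2 = \cdots = \mu_m = \mu \\
&p_{ij} = \frac{1}{m} \qquad \text{all } i, j \\
&\lambda_{ij} = \frac{\lambda}{m} \qquad \text{all } i, j
\end{aligned}
\qquad (2)
$$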
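Assumption g) and its specialization (2) amount to a small rate matrix. In the sketch below the function names are hypothetical, and the row-sum check is our addition (each processor's $p_{ij}$ form a probability distribution). The general case builds $\lambda_{ij} = \lambda_i p_{ij}$; the uniform case reduces every entry to λ/m.

```python
from typing import List


def access_rates(lam: List[float], prob: List[List[float]]) -> List[List[float]]:
    """lambda_ij = lambda_i * p_ij for every processor i and memory j."""
    if len(lam) != len(prob):
        raise ValueError("need one row of p_ij per processor")
    rates = []
    for lam_i, row in zip(lam, prob):
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of p_ij must sum to 1")
        rates.append([lam_i * p_ij for p_ij in row])
    return rates


def uniform_access_rates(p: int, m: int, lam: float) -> List[List[float]]:
    """Assumptions h) and i): lambda_i = lambda and p_ij = 1/m for all i, j."""
    return access_rates([lam] * p, [[1.0 / m] * m for _ in range(p)])


print(uniform_access_rates(3, 2, 0.5))   # [[0.25, 0.25], [0.25, 0.25], [0.25, 0.25]]
```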


With these additional assumptions we succeed in performing an exact
analysis of some moderately complex systems, but still cannot attack very
large problems.


The equal processor access rate assumption in h) was shown to be a
conservative one in the single bus case [AJMO80] and is expected to be so also in the
more general case of multiple busses.

Processors access the common memory modules to perform either read or
write operations. We do not distinguish between the two operations in our
models, and therefore do not account for the fact that a processor may attempt
to read data which is not present in common memory. Such an attempt results in
the processor going idle, with a consequent throughput reduction. This feature
can be included in the Markovian model, but the state space is greatly
expanded. A more system-oriented approach can be pursued by assuming that a
fraction α of the accesses is for write operations and a fraction (1-α) is for
read operations, and that a read operation finds the required information with
probability q. Assuming that the access request generation process is not
altered by not finding the desired information, a fraction (1-q)(1-α) of the
accesses are unsuccessful reads, and the actual time spent in useful
computation is reduced by that fraction. Thus, the actual processing power of
the system is simply obtained by applying the factor 1-(1-q)(1-α) to the
computed value of P. Obviously in this case it is necessary to estimate the
values of q and α, which depend on many system parameters.
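Here is a minimal sketch of the correction just described. It rests on our reading that each unsuccessful read wastes the period of useful computation that follows it, so the computed P is scaled by 1-(1-q)(1-α). The function name is hypothetical, and α and q must be estimated separately.

```python
def effective_processing_power(P: float, alpha: float, q: float) -> float:
    """Scale the computed processing power P for unsuccessful reads.

    alpha: fraction of common-memory accesses that are writes
    q:     probability that a read finds the requested information
    Assumes (our reading of the text) that each failed read wastes the
    period of useful computation that follows it.
    """
    for name, value in (("alpha", alpha), ("q", q)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    failed_reads = (1.0 - alpha) * (1.0 - q)   # fraction of accesses wasted
    return P * (1.0 - failed_reads)


print(effective_processing_power(1.5, alpha=0.5, q=0.8))   # about 1.35
```

When every access is a write (α = 1) or every read succeeds (q = 1), the factor is 1 and P is unchanged.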

Using all the above assumptions we can now construct a Markov chain to
model the behavior of the system.

The state of the Markov chain is defined by the 2p-tuple

$$ (m_1, s_1, m_2, s_2, \ldots, m_p, s_p) \qquad (3) $$

where:

$m_i$ is the memory referenced by processor i

$s_i$ is the state of processor i

$m_i$ can take the values:

 0: processor's private memory
 k: k-th common memory module

$s_i$ can take the values:

 0: active
 j: queueing (j-th in queue) for module $m_i$
 -1: accessing common memory module $m_i$

This state definition, however, is not the most convenient from the
computational point of view. In fact, using the theory of "lumpable" Markov chains
[KEME60], we may lump equivalent states and obtain a Markov chain of
substantially smaller size. The lumping technique is illustrated by an example in
section 5.
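To make the growth of the state space concrete, the sketch below enumerates the 2p-tuples of (3) for a p×m×m crossbar, where a free memory is always reachable. It then groups the tuples by the sorted queue lengths at the busy modules, which is the grouping used for the lumped chains of section 5. The function names are hypothetical. The code only counts and groups states and does not by itself verify lumpability.

```python
from itertools import permutations, product


def detailed_states(p: int, m: int):
    """Yield the 2p-tuples (m_1, s_1, ..., m_p, s_p) of eq. (3), crossbar case.

    Each processor references module 0 (private) or 1..m; the processors at a
    common module form an ordered list: first accessing (s = -1), rest queued.
    """
    for target in product(range(m + 1), repeat=p):
        groups = [[i for i in range(p) if target[i] == k] for k in range(1, m + 1)]
        for orders in product(*(permutations(g) for g in groups)):
            s = [0] * p                          # 0 = active
            for order in orders:
                for pos, proc in enumerate(order):
                    s[proc] = -1 if pos == 0 else pos
            yield tuple(x for i in range(p) for x in (target[i], s[i]))


def lump(state, m: int):
    """Macrostate: queue lengths at the busy modules, in decreasing order."""
    counts = [0] * (m + 1)
    for k in state[0::2]:                        # referenced modules m_i
        counts[k] += 1
    return tuple(sorted((c - 1 for c in counts[1:] if c > 0), reverse=True))


for p in (2, 3, 4):
    states = list(detailed_states(p, 2))
    print(f"{p}x2x2: {len(states)} detailed, {len({lump(s, 2) for s in states})} lumped")
```

For p = 2 and p = 3 with m = 2, the sketch yields 11 and 49 detailed states and 4 and 6 lumped states. The 3x2x2 figures match the counts quoted in section 5.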

The state definition and the degree of lumpability of the chain depend
on the policy that is used to assign a free bus to a queueing processor.
Assumption l) is the most convenient from the model complexity point of view,
but might not be the one that yields the best performance. Modifications of
assumption l) will be briefly discussed in the sequel.
4. PERFORMANCE MEASURES


The processing power is not the most appropriate performance index for
some applications; other parameters may better describe the quality of the
system in some cases. Fortunately, however, many different performance
indices can be simply derived from the processing power.

Define Λ to be the rate at which customers cycle through the queueing
network. Applying Little's result to the active processors, whose average
think time is 1/λ, we have:

$$ \Lambda = P\lambda \qquad (4) $$


Applying Little's result again to the entire memory system, including
queues and servers, which holds on average p - P processors, we find the
average customer delay D:

$$ D = \frac{p - P}{P\lambda} \qquad (5) $$

Finally, subtracting from D the average service time 1/μ we have the
average queueing time W:

$$ W = D - \frac{1}{\mu} = \frac{p - P(1+\rho)}{P\lambda} \qquad (6) $$

where ρ = λ/μ.

The average number of queued processors is:

$$ \bar{n}_q = W P \lambda = p - P(1+\rho) \qquad (7) $$

therefore the average number of processors accessing common memory modules is:

$$ \bar{n}_s = p - P - \bar{n}_q = P\rho \qquad (8) $$


From the values of average cycle time (p/Λ = D + 1/λ), average queueing
time, average think time and average service time we can now construct many
different performance indices, depending on the particular application.


If the processors are simply updating a data base, a reasonable
performance measure could be the ratio of the memory access time to the sum of the
access time plus the waiting time. Using the above results, this performance
index is expressed as follows:

$$ \frac{1/\mu}{1/\mu + W} = \frac{1/\mu}{D} = \frac{\rho P}{p - P} $$


If, on the other hand, our multiprocessor system is a packet switch
operating under heavy load conditions, where input processors process packets
and write them into a common memory and output processors read them and again
process them before queueing them for output, then the "think" time represents
the time necessary to process an incoming (outgoing) packet and the service
time represents the time necessary to write (read) a packet from an input
(output) processor. (Note that the exponential read/write time corresponds to
an exponential packet length distribution.) The throughput of the packet switch
can then be expressed as:

$$ T_{ps} = \frac{\Lambda}{2} = \frac{P\lambda}{2} \qquad (11) $$


Note that each packet must be processed by an input and an output processor;
both operations require one cycle time, and p packets can be processed
simultaneously.

Performance indices for other applications can be constructed in a
similar way.
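Relations (4)-(8), the average cycle time, the data base index and (11) can be collected in one helper. This is a sketch with hypothetical names. P itself must come from one of the models in the following sections.

```python
from dataclasses import dataclass


@dataclass
class Measures:
    cycle_rate: float          # Lambda = P * lam, eq. (4)
    delay: float               # D, eq. (5)
    queueing_time: float       # W, eq. (6)
    mean_queued: float         # eq. (7)
    mean_in_service: float     # eq. (8)
    cycle_time: float          # p / Lambda = D + 1/lam
    db_index: float            # (1/mu) / (1/mu + W)
    packet_throughput: float   # eq. (11)


def measures(P: float, p: int, lam: float, mu: float) -> Measures:
    """Derive the section 4 indices from a processing power P with 0 < P < p."""
    if not 0.0 < P < p:
        raise ValueError("expected 0 < P < p")
    rho = lam / mu
    Lam = P * lam                  # Little's result on the active processors
    D = (p - P) / Lam              # Little's result on the memory subsystem
    W = D - 1.0 / mu
    return Measures(
        cycle_rate=Lam,
        delay=D,
        queueing_time=W,
        mean_queued=W * Lam,
        mean_in_service=P * rho,
        cycle_time=p / Lam,
        db_index=(1.0 / mu) / D,
        packet_throughput=Lam / 2.0,
    )


print(measures(8 / 9, 2, 1.0, 1.0))   # 2x2x2 system with lambda = mu, P from (15)
```

For example, the 2x2x2 system of section 5 with λ = μ = 1 has P = 8/9 from (15). The helper then gives D = 1.25, W = 0.25 and a mean of 2/9 queued processors, which agrees with P(-11) = ρ²P(00) = 2/9 from (14).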


5. CROSSBAR ARCHITECTURES


We begin by presenting as an example the simplest non-trivial case, a
2-processor, 2-memory, 2-bus (2x2x2) system. (Note: the even simpler case of
a single bus structure is trivial, and can be analyzed using an M/M/1 queue
with finite population. Extensions of the single bus system to different
processor access rates and general service distributions are found in [AJMO80].)

A px2x2 system is a crossbar multiprocessor and can thus be studied as a
closed queueing network with p classes of customers. Due to the assumptions
introduced, the solution can be obtained by application of the product form
solution [BASK75]. We shall nevertheless construct a Markov chain model, as
explained before, to provide a first simple example.

The state definition in the case of a 2x2x2 system is

$$ (m_1, s_1, m_2, s_2) \qquad (12) $$

and the Markov chain that we obtain using assumptions a) through g) is shown
in fig. 3a. In this case no lumping is possible. However, if we add
assumptions h) through l), the transition rates are modified as shown in fig. 3b.

We can apply the lumping technique to this Markov chain by defining
macrostates as follows (each detailed state is written compactly as
$m_1 s_1 m_2 s_2$):

(00) = [(0000)]

(-10) = [(1-100), (2-100), (001-1), (002-1)]

(-11) = [(1-111), (2-121), (111-1), (212-1)]

(-1-1) = [(1-12-1), (2-11-1)]

The lumped chain is shown in fig. 4.
The steady state probabilities for the chain in fig. 4 are now very easily
evaluated, yielding (with ρ = λ/μ):

$$
\begin{aligned}
P(-10) &= 2\rho\, P(00) \\
P(-11) &= \rho^2\, P(00) \\
P(-1-1) &= \tfrac{1}{2}\rho^2\, P(00) \\
P(00) &= \left[1 + 2\rho + \tfrac{3}{2}\rho^2\right]^{-1}
\end{aligned}
\qquad (14)
$$

The processing power P, defined as the average number of active processors, is
obtained as:

$$ P = 2P(00) + P(-10) = 2(1+\rho)\left[1 + 2\rho + \tfrac{3}{2}\rho^2\right]^{-1} \qquad (15) $$
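As a numerical cross-check of (14) and (15), the sketch below solves the lumped chain of fig. 4 directly. The transition rates are our reading of that chain under assumptions a) through l), and they are consistent with (14). The small Gauss-Jordan solver and all names are hypothetical helpers, not part of the report.

```python
def stationary(states, rates):
    """Steady-state vector of a small irreducible CTMC (pi Q = 0, sum pi = 1).

    rates maps (src, dst) -> transition rate; solved by Gauss-Jordan elimination.
    """
    idx = {s: k for k, s in enumerate(states)}
    n = len(states)
    A = [[0.0] * n for _ in range(n)]
    for (src, dst), r in rates.items():
        A[idx[dst]][idx[src]] += r          # probability flow into dst
        A[idx[src]][idx[src]] -= r          # probability flow out of src
    A[-1] = [1.0] * n                       # normalization replaces one balance eq.
    x = [0.0] * (n - 1) + [1.0]
    for c in range(n):                      # Gauss-Jordan with partial pivoting
        piv = max(range(c, n), key=lambda row: abs(A[row][c]))
        A[c], A[piv] = A[piv], A[c]
        x[c], x[piv] = x[piv], x[c]
        for row in range(n):
            if row != c and A[row][c] != 0.0:
                f = A[row][c] / A[c][c]
                A[row] = [a - f * b for a, b in zip(A[row], A[c])]
                x[row] -= f * x[c]
    return {s: x[idx[s]] / A[idx[s]][idx[s]] for s in states}


def lumped_2x2x2(lam, mu):
    """Lumped chain of fig. 4 with the rates implied by assumptions a)-l)."""
    rates = {
        ("00", "-10"): 2 * lam,      # either active processor issues a request
        ("-10", "00"): mu,           # the single access completes
        ("-10", "-11"): lam / 2,     # active processor requests the busy module
        ("-10", "-1-1"): lam / 2,    # ... or the free module
        ("-11", "-10"): mu,          # access completes, queued processor starts
        ("-1-1", "-10"): 2 * mu,     # either of the two accesses completes
    }
    return stationary(["00", "-10", "-11", "-1-1"], rates)


def P_eq15(rho):
    """Closed form (15)."""
    return 2 * (1 + rho) / (1 + 2 * rho + 1.5 * rho ** 2)


lam, mu = 0.3, 1.0
pi = lumped_2x2x2(lam, mu)
print(2 * pi["00"] + pi["-10"], P_eq15(lam / mu))   # the two values agree
```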

As soon as we increase the number of processors by one, we realize that
the general description is not practical. We have 49 states in this case,
which we can lump into 6 macrostates as shown in fig. 5.

The processing power is now obtained as:

$$ P = 3P(000) + 2P(-100) + P(-101) + P(-1-10) \qquad (16) $$


In the same manner we get the lumped chain for the case of four
processors, shown in fig. 6.

In this case we see that the lumped chain has two states with two
processors accessing the common memory and two processors in queue. State a
is such that both processors queue for the same memory and state b is such

Fig. 6 - Lumped Markov chain for the 4x2x2 system.
In the case of a px2x2 system we are not concerned with the policy followed to
choose the next processor to be served when a bus becomes available; the only
possible action is to pick one of the processors queueing for the
memory that has become available (this is true in general for any crossbar
architecture). The fact that we choose the first in the queue is irrelevant for
the evaluation of the processing power.

We can now draw the lumped chain in the general case of a px2x2 system
(Fig. 7). The number of states of the lumped chain, N, can be evaluated as:


The number of active processors associated with each state is:

$$ p - n_m - n_{q_1} - n_{q_2} $$

and thus the evaluation of the processing power is straightforward, once the
steady state probabilities associated with the states of the Markov chain are
evaluated.


In the case of a 3 memory, 3 bus system the state of the lumped Markov
chain is defined as:

$$ (n_m, n_{q_1}, n_{q_2}, n_{q_3}), \qquad n_{q_1} \ge n_{q_2} \ge n_{q_3} $$

where

$n_m$, $n_{q_1}$, $n_{q_2}$ are defined as before

$n_{q_3}$ is the number of processors queueing for the third common memory
currently accessed.

The lumped chain that we obtain is shown in fig. 8. The transition
rates between the states are not shown, but can be easily evaluated.

In the general case of p processors, m memories and m busses (p ≥ m) the
state of the lumped chain is defined by the (m+1)-tuple

$$ (n_m, n_{q_1}, \ldots, n_{q_m}), \qquad n_{q_1} \ge n_{q_2} \ge \cdots \ge n_{q_m} \qquad (21) $$

and the definition of the entries is a straightforward extension of the
previous case.

The structure of the Markov chain is the same as in fig. 8 up to level
3; beyond that level more states must be considered.
We can express the transition rates between two states in the general
case, provided that we specify the state of the Markov chain more precisely.

Given a state as in (21), the entries $n_{q_j}$ must be arranged in decreasing
order. We will then have some groups of adjacent entries with the same value.

Each entry of the state can increase or decrease by at most one unit at a
time, and only one entry can change at a time.

Given a group of entries $n_{q_{k+1}}, n_{q_{k+2}}, \ldots, n_{q_{k+j}}$, all with the
same value, only the first entry of the group can increase by one unit, and only
the last entry can decrease by one unit. In this manner we are sure to preserve
the decreasing order of the entries.

Consider now a state, written for brevity with $i = n_m$ and $q_j = n_{q_j}$:

$$ (i, q_1, q_2, \ldots, q_m) \qquad (22) $$

This state can evolve into at most 2(m+1) other states, which are identified
by the following transitions:

$$
\begin{aligned}
i &\to i+1 \\
i &\to i-1, && i > 0 \\
q_j &\to q_j + 1, && q_j \text{ first entry of a group} \\
q_j &\to q_j - 1, && q_j \text{ last entry of a group},\ q_j > 0
\end{aligned}
\qquad (23)
$$

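The rates associated with each of these transitions are:

$$
R(i \to i+1) = \begin{cases} p\lambda, & i = 0 \\ (p-n)\,\dfrac{m-i}{m}\,\lambda, & 0 < i < m,\ n < p \end{cases} \qquad (24a)
$$

$$ R(i \to i-1) = (i-s)\,\mu, \qquad i-1 \ge s \qquad (24b) $$

$$ R(q_j \to q_j+1) = (p-n)\,\frac{l}{m}\,\lambda, \qquad j \le i,\ n < p \qquad (24c) $$

$$ R(q_j \to q_j-1) = l\,\mu, \qquad j \le i,\ q_j > 0 \qquad (24d) $$

where:

$$
\begin{aligned}
n &= i + \textstyle\sum_{j=1}^{i} q_j \\
l &= \#\text{ of entries among } q_1, q_2, \ldots, q_i \text{ with the same value as } q_j \text{ (including } q_j \text{ itself)} \\
s &= \#\text{ of nonzero entries among } q_1, q_2, \ldots, q_i
\end{aligned}
\qquad (25)
$$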

The number of active processors associated with each state is simply p-n;
it is thus very easy to obtain the processing power, once we have solved for
the steady state probability distribution of the Markov chain.
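Rules (23)-(25) translate directly into a generator for the lumped p×m×m chain. In this sketch, whose names are all hypothetical, a state stores only the queue lengths of the i busy modules, sorted in decreasing order, so the trailing zero entries for idle modules are implicit. The code applies rates (24a)-(24d) as reconstructed above. As a check, it compares the resulting P with the product-form solution [BASK75] mentioned at the start of this section, written here for the equivalent single-class network: one infinite-server think station with mean 1/λ and m single-server memory stations, each visited with probability 1/m.

```python
from itertools import product
from math import factorial


def stationary(states, rates):
    """Steady-state vector of a small irreducible CTMC (pi Q = 0, sum pi = 1)."""
    idx = {s: k for k, s in enumerate(states)}
    n = len(states)
    A = [[0.0] * n for _ in range(n)]
    for (src, dst), r in rates.items():
        A[idx[dst]][idx[src]] += r          # probability flow into dst
        A[idx[src]][idx[src]] -= r          # probability flow out of src
    A[-1] = [1.0] * n                       # normalization replaces one balance eq.
    x = [0.0] * (n - 1) + [1.0]
    for c in range(n):                      # Gauss-Jordan with partial pivoting
        piv = max(range(c, n), key=lambda row: abs(A[row][c]))
        A[c], A[piv] = A[piv], A[c]
        x[c], x[piv] = x[piv], x[c]
        for row in range(n):
            if row != c and A[row][c] != 0.0:
                f = A[row][c] / A[c][c]
                A[row] = [a - f * b for a, b in zip(A[row], A[c])]
                x[row] -= f * x[c]
    return {s: x[idx[s]] / A[idx[s]][idx[s]] for s in states}


def crossbar_chain(p, m, lam, mu):
    """Lumped p x m x m chain built from the rates (24a)-(24d)."""
    states, rates, todo = {()}, {}, [()]

    def add(src, q, r):
        dst = tuple(sorted(q, reverse=True))
        rates[(src, dst)] = rates.get((src, dst), 0.0) + r
        if dst not in states:
            states.add(dst)
            todo.append(dst)

    while todo:
        q = todo.pop()
        i, n = len(q), len(q) + sum(q)          # busy modules, processors at memory
        if n < p and i < m:                     # (24a): request to a free module
            add(q, q + (0,), (p - n) * (m - i) * lam / m)
        for v in set(q):
            l, k = q.count(v), q.index(v)       # group size, first position
            rest = q[:k] + q[k + 1:]
            if n < p:                           # (24c): join a queue of length v
                add(q, rest + (v + 1,), (p - n) * l * lam / m)
            if v > 0:                           # (24d): a queue head starts
                add(q, rest + (v - 1,), l * mu)
            else:                               # (24b): a module with no queue frees
                add(q, rest, l * mu)
    return sorted(states, key=lambda s: (len(s) + sum(s), s)), rates


def processing_power(p, m, lam, mu):
    states, rates = crossbar_chain(p, m, lam, mu)
    pi = stationary(states, rates)
    return sum(pr * (p - len(s) - sum(s)) for s, pr in pi.items())


def product_form_P(p, m, lam, mu):
    """Reference value: occupancies (n_1..n_m) weighted by (rho/m)**k / (p - k)!."""
    rho = lam / mu
    num = den = 0.0
    for occ in product(range(p + 1), repeat=m):
        k = sum(occ)                            # processors at the memories
        if k <= p:
            w = (rho / m) ** k / factorial(p - k)
            num += w * (p - k)
            den += w
    return num / den


print(processing_power(4, 3, 0.5, 1.0), product_form_P(4, 3, 0.5, 1.0))
```

Storing only the busy modules' queues avoids carrying the zero entries of idle modules explicitly. The counter i plays exactly the role it has in (22).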
6. MULTIBUS ARCHITECTURES: EXACT MODELS


For multiple bus architectures, the complexity of the Markov chains is
much larger than for crossbar, even when lumping is used. Therefore we can
handle only moderately complex systems using the exact state description. For
the most general case we must resort to approximate models.

The state definition for the exact lumped chain in the case of a multiple
bus system is:

$$ (n_m, q_1, q_2, \ldots, q_m) \qquad (26) $$

where

$n_m$ is the number of processors currently accessing a common memory

$q_1, \ldots, q_{n_m}$ are the numbers of processors queueing for the memories
currently accessed, arranged in decreasing order

$q_{n_m+1}, \ldots, q_m$ are the numbers of processors queueing for a free memory,
not accessible because no bus is available, arranged in decreasing order.


Some examples of lumped Markov chains are given in figs. 9 through 13,
for 3x3x2, 4x3x2, 5x3x2, 4x4x2 and 4x4x3 systems, respectively.

Note that an increase in the number of processors and/or memories
complicates the Markov chain, whereas an increase in the number of busses tends
to simplify the Markovian representation. This is because a higher number of
busses makes the system more similar to a crossbar, and thus reduces the
number of possible queueing situations.

When the number of busses is just one less than the number of processors,
the policy for the choice of the next processor to be served is irrelevant.
In the other cases the Markov chain depends on such policy. Consider
for instance a 4x3x2 system where the next processor served is the one
that has been waiting longest. In this case the Markov chain is the one shown
in fig. 14, where an asterisk is added to indicate which customer has
priority. In general, modifications of assumption l) require that more information
about the state of the system queues be recorded in the Markov chain state
description. The resulting chains may thus be much more complex than those
obtained using assumption l).

The general pxmxb case is not easy to handle, even after lumping is
applied. We will therefore introduce in the next section some approximations
which further reduce the size of the Markov chain and permit us to attack the
most general case.
Fig. 9 - Lumped Markov chain for the 3x3x2 system.

Fig. 10 - Lumped Markov chain for the 4x3x2 system.

Fig. 12 - Lumped Markov chain for the 4x4x2 system.

... using a FCFS discipline.


7. MULTIBUS ARCHITECTURES: APPROXIMATE MODELS

The reason for the introduction of approximate Markovian models is that,
for general multibus systems, the number of states increases very rapidly with
system size. The explosive growth is due to the detailed information that the
states must record about the queues inside the system. In particular, for each
state of the Markov chain the number of customers queued for all common memory
modules must be recorded. That is, we not only need to know the number of
queued customers, but also must be concerned with all the possible ways of
distributing these customers among the system queues. If we reduce the amount
of information about the status of the queues, we no longer have a first order
Markov chain behavior in the evolution of the system through the state space.
The approximate Markov models that we introduce in this section analyze the
system behavior by assuming that the transitions between the states with
reduced queueing information still satisfy the Markov property. The results
that we obtain in this way are approximate and must then be compared to
the exact ones to test their accuracy.

In order to define a simplified model, one needs to specify:

a) the state definition, that is, the amount of information used to
describe the state of the Markov chain. As mentioned before, we will use
reduced information about the queues in the system.

b) the method used to calculate the transition rates for the simplified
Markov model. Since the behavior is approximated by the simplified Markov chain,
the transition rates must be evaluated according to some empirical rule, and
several different rules can be envisioned.

Three different state definitions (named A, B and C) and two heuristic
methods for the evaluation of the transition rates (named 1 and 2) were
considered. The approximate models are named using the letter referring to the
state description and the number referring to the evaluation of the transition
rates.


Let  us  first  begin  with  a  very  simple  model: 


Model A1 - The state of the system is simply represented by the
total number of processors waiting either for a busy memory or for a
busy bus, and by the number of processors currently accessing a common
memory module. We thus have a pair

$$ (n_s, n_q) \qquad (27) $$

where

$n_s$ = # of processors in service
$n_q$ = # of processors queued

The transition rates are evaluated by assuming that each active
processor can request any memory module with the same probability (uniform
reference model). Furthermore, each queued processor is assumed to request,
with uniform probability, any of the common memory modules currently not
accessible (this approximation implies that a queued processor can randomly
reselect a new memory when a memory or bus becomes unblocked).
If we apply this approximation to the 2x2x2 system and to the 3x2x2 system we find again the exact (lumped) chains. In other words, the above assumptions are automatically verified in such small systems, and therefore no approximation is introduced.

Consider now a 4x2x2 system; in this case we have two states in which two processors are queued. Our approximate chain will consider these two states as a single one. Note, however, that the merging violates the conditions for lumping. Some error will, therefore, appear in the results due to such "prohibited" lumping. The chain that we get is shown in fig. 15.

This approximation can be extended very easily to the px2x2 system, and the resulting chain is shown in fig. 16. The number of states N is in this case only twice the number of processors.

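To illustrate the rate computation, consider states (2,p-2) and (1,p-2). The rate from (2,p-2) to (1,p-2) is evaluated by multiplying the rate out of state (2,p-2), which is 2μ, by the probability that none of the p-2 queued processors is referencing the memory that becomes free. Since each queued processor references either of the two busy memories with probability 1/2, such probability is $(1/2)^{p-2}$, and the rate is therefore $2\mu\,(1/2)^{p-2}$.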

Carrying out the analysis for the most general case, we find that the pxmxb system is represented by a Markov chain with b vertical chains (see fig. 17) and a total number of states N, where:

$$N = 1 + b\left[p + \tfrac{1}{2}(1-b)\right] \qquad (28)$$

A simple upper bound on N is $N \le pb + 1$.
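Formula (28) can be checked by direct counting. The count below assumes p ≥ b and reads the state (27) as (i,j) = (n_a, n_q), as in appendix 1: there is the empty state (0,0) and, for each i = 1, ..., b, the states j = 0, ..., p-i.

```latex
N = 1 + \sum_{i=1}^{b} (p - i + 1)
  = 1 + b(p+1) - \frac{b(b+1)}{2}
  = 1 + b\left[p + \tfrac{1}{2}(1-b)\right]
```

For b = 2 this reduces to N = 2p, the px2x2 count given above, and since (1-b)/2 ≤ 0 for b ≥ 1 it also yields the bound N ≤ pb + 1.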
Fig. 17 - Chain of the pxmxb system with the approximate model A1 (b vertical chains).


The transition rates can be written explicitly for the most general case; their derivation is reported in appendix 1. Since the number of active processors in state (i,j) is p-i-j, the processing power can be evaluated directly as $P = \sum_{(i,j)} (p-i-j)\,\pi(i,j)$ once the steady state distribution π of the Markov chain is known.
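As an illustration only, the Python sketch below builds the chain of model A1 from the rates (A1.3)-(A1.5) of appendix 1 and evaluates the processing power. All function names are hypothetical. The rate for j → j-1 does not appear in this part of the report, so the sketch assumes it is the complement of the i → i-1 event, meaning a queued processor takes the memory or bus that has just been freed. Under these assumptions the 4x3x2 system at ρ = λ/μ = 1 gives P = 20/13 ≈ 1.538. That is about 2.15% above the exact value .1506e+01 of fig. 35, which agrees with the A1 column there.

```python
# Sketch (our reading of appendix 1, not the authors' program): build and solve
# the approximate chain of model A1 for a p x m x b system with m >= b.

def a1_states(p, b):
    """States (i, j): i processors accessing memory, j queued; j > 0 needs i >= 1."""
    return [(0, 0)] + [(i, j) for i in range(1, b + 1) for j in range(p - i + 1)]


def a1_rates(p, m, b, lam, mu):
    """Transition rates {state: {next_state: rate}} following (A1.3)-(A1.5).
    The j -> j-1 rate is an assumption: the complement of the i -> i-1 event."""
    rates = {}
    for i, j in a1_states(p, b):
        active, out = p - i - j, {}
        if active > 0 and i < b:
            out[(i + 1, j)] = active * (m - i) / m * lam      # (A1.3)
            if i > 0:
                out[(i, j + 1)] = active * i / m * lam        # (A1.5), i < b
        elif active > 0:
            out[(i, j + 1)] = active * lam                    # (A1.5), i = b
        if i > 0:
            # probability that no queued processor can use the unit just freed
            free = ((i - 1) / i) ** j if i < b else ((b - 1) / m) ** j
            if free > 0:
                out[(i - 1, j)] = i * mu * free               # (A1.4)
            if j > 0:
                out[(i, j - 1)] = i * mu * (1 - free)         # assumed complement
        rates[(i, j)] = out
    return rates


def steady_state(states, rates):
    """Solve pi Q = 0, sum(pi) = 1 by Gauss-Jordan elimination (no external libs)."""
    n, idx = len(states), {s: k for k, s in enumerate(states)}
    a = [[0.0] * n for _ in range(n)]                         # a = Q transposed
    for s, out in rates.items():
        for t, r in out.items():
            a[idx[t]][idx[s]] += r
            a[idx[s]][idx[s]] -= r
    a[-1], rhs = [1.0] * n, [0.0] * (n - 1) + [1.0]           # normalization row
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv], rhs[c], rhs[piv] = a[piv], a[c], rhs[piv], rhs[c]
        for r in range(n):
            if r != c and a[r][c] != 0.0:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
                rhs[r] -= f * rhs[c]
    return {s: rhs[idx[s]] / a[idx[s]][idx[s]] for s in states}


def processing_power(p, m, b, lam, mu):
    """P = sum over states (i, j) of (p - i - j) * pi(i, j)."""
    states = a1_states(p, b)
    pi = steady_state(states, a1_rates(p, m, b, lam, mu))
    return sum((p - s[0] - s[1]) * pi[s] for s in states)


print(round(processing_power(4, 3, 2, 1.0, 1.0), 4))   # 1.5385
```

A small dense solver keeps the sketch dependency-free. For larger chains, such as the 46-state A1 model of the 16x8x3 system, a library linear solver would normally be preferred.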

Next we introduce a modification of model A1, by specifying a different method for the calculation of the transition rates:

Model A2 - The state of the system is defined as in model A1. The transition rates are evaluated using an "averaging" technique.

We describe model A2 using a 3x3x2 system as an example.

The exact lumped chain for the 3x3x2 system is shown in fig. 9. Using our approximation, the states (2100) and (2001) are merged into state (2,1), even if this violates the lumping conditions. In the approximate chain all the transition rates are unchanged, except for those in and out of state (2,1). Namely, the rates into state (2,1) are obtained by adding the rates into the two merged states. The rates out of state (2,1) are obtained by noting that the total rate out must be 2μ, and that the rate out of the two merged states is μ towards state (1,1) and μ+2μ towards state (2,0). We thus average the rate out of state (2,1), keeping the same ratio. The resulting chain is shown in fig. 18.
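The averaging rule can be written as two small helper functions. This is a sketch of our reading of the rule, and the function names are hypothetical. Rates into the merged state are summed per source. The summed rates out are rescaled so that they add up to the common total exit rate (2μ in the example), which keeps their ratio. With the 3x3x2 numbers from the text this gives μ/2 towards (1,1) and 3μ/2 towards (2,0).

```python
# Sketch of the A2 "averaging" rule for one merged state (our reading of the text).

def averaged_exit_rates(summed_out, total_out):
    """Rescale the summed exit rates so they add up to total_out, keeping their ratio."""
    s = sum(summed_out.values())
    return {target: total_out * r / s for target, r in summed_out.items()}


def merged_entry_rates(*entry_rates):
    """Rates into the merged state: add, per source, the rates into each merged state."""
    merged = {}
    for rates in entry_rates:
        for source, r in rates.items():
            merged[source] = merged.get(source, 0.0) + r
    return merged


mu = 1.0
# 3x3x2 example: mu towards (1,1) and mu + 2*mu towards (2,0), total exit rate 2*mu
print(averaged_exit_rates({(1, 1): mu, (2, 0): mu + 2 * mu}, 2 * mu))
# {(1, 1): 0.5, (2, 0): 1.5}
```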

Note that no error is made in the approximation if the merged states have equal steady state probability. Otherwise the resulting chain only approximates the exact one.
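A short calculation shows why equal probabilities remove the error. Let a and b be the two merged states, r_{a,x} and r_{b,x} their rates towards a target state x, and R their common total exit rate (2μ in the example). Since Σ_y (r_{a,y} + r_{b,y}) = 2R, the averaging rule gives the first expression below. The rate that preserves the steady state probability flow out of the pair is the second.

```latex
\tilde r_x = R\,\frac{r_{a,x} + r_{b,x}}{\sum_y \left(r_{a,y} + r_{b,y}\right)} = \frac{r_{a,x} + r_{b,x}}{2},
\qquad
r^{*}_x = \frac{\pi_a\, r_{a,x} + \pi_b\, r_{b,x}}{\pi_a + \pi_b}
```

The two coincide for every target x exactly when π_a = π_b.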
The px2x2 system, using this approximation, is represented by the chain of fig. 19.

The more general case of p processors, m memories and 2 busses can still be handled, provided that we solve the combinatorial problem of counting the number of states at each level of the exact lumped chain. The level of a state is defined as the sum of the number of processors accessing common memory and the number of queued processors. There is only one state at levels 0 and 1, and there are two states at level 2. For levels larger than two we have one state with n_a = 1 and n(m,2,k) states with n_a = 2. The expression of n(m,2,k) is derived in appendix 2. The approximate chain in the case of a pxmx2 system is shown in fig. 20.

The extension to the general pxmxb system with an arbitrary number of busses requires counting the states at each level of a more complex Markov chain, and the corresponding evaluation of new transition rates.


We now consider another definition of system state (yet retaining the rate computation rule of model A2):

Model B2 - The state of the system is represented by the following triplet: (1) the number of processors accessing a common memory module; (2) the total number of processors waiting either for a busy memory or for a busy bus; and (3) a flag which is set to zero when no processor is queued for a bus, and is set to one when one or more processors are queued for a bus in order to access a free common memory module.
Namely, the state is defined by the triplet

$$(n_a, n_q, f) \qquad (29)$$

where

n_a = # of processors accessing common memory
n_q = # of queued processors
f = flag: 0 no queue for a bus
          1 one or more processors are queued for a bus

The transition rates are evaluated using the averaging technique described for model A2.

Clearly, model B2 is a refinement of A2, since the state description is enriched with binary information concerning the system queues.

We immediately recognize that for crossbar architectures the B2 approximation is the same as the A2 approximation, since the flag is always zero (no wait for a bus).

Consider now a 4x3x2 system: the approximate chain is shown in fig. 21. If we compare this chain with the exact lumped chain of fig. 10, we see that four states have been merged into two, violating the lumpability conditions. The new transition rates are computed using the averaging technique. The approximate Markov chain for a 5x3x2 system is shown in fig. 22.

The general pxmx2 case can be managed by using the combinatorial results of appendix 2. The resulting chain is shown in fig. 23. The total number of states is in this case

$$N = 3(p-1) + 1 \qquad (30)$$

Fig. 22 - Chain of the 5x3x2 system with the approximate model B2.

Fig. 23 - Chain of the pxmx2 system with the approximate model B2.


As an example, the px3x2 chain is shown in fig. 24. In this particular case the combinatorial results can be put in polynomial form (see appendix 2).

The number of active processors associated with each state is p - n_a - n_q, thus the processing power can be easily computed once the steady state probability distribution of the chain is known.

All the preceding approximate models lack one feature which is very desirable in analytic models: namely, a closed form solution. We introduce here the simplest possible model, which provides us with a closed form solution.

Model C2 - The system state is simply the number of active processors: no account is kept of the state of internal queues. The transition rates are evaluated using the averaging technique.


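Fig. 24 - Chain of the px3x2 system with the approximate model B2.

The transition diagram in the case of a pxmx2 system is shown in fig. 25. We have reduced the system description to a birth and death Markov chain, whose solution is easily obtained. Denote by π(i) the steady state probability of state i (i active processors) and let ρ = λ/μ; then

$$\pi(p) = \left[\,1 + \sum_{j=0}^{p-1} \frac{p!}{j!}\,\rho^{\,p-j} \prod_{k=1}^{p-j} \frac{1}{\beta_k}\right]^{-1} \qquad (31)$$

with

$$\beta_1 = 1, \qquad \beta_i = \frac{1 + 2\,n(m,2,i)}{1 + n(m,2,i)}, \quad i > 1$$

where n(m,2,i) is defined in appendix 2. The processing power can then be expressed as:

$$P = \pi(p)\left[\,p + \sum_{j=1}^{p-1} \frac{p!}{(j-1)!}\,\rho^{\,p-j} \prod_{k=1}^{p-j} \frac{1}{\beta_k}\right]$$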


The general pxmxb case can also be solved. The resulting Markov chain is shown in fig. 26. The steady state probabilities are in this case:

$$\pi(p) = \left[\,1 + \sum_{j=0}^{p-1} \frac{p!}{j!}\,\rho^{\,p-j} \prod_{k=1}^{p-j} \frac{1}{\beta_k}\right]^{-1}$$

where β_1 = 1 and:

$$\beta_i = \frac{\sum_{j=1}^{b-1} j\,p_j(i) + b \sum_{j=0}^{i-b} p_b(j+b)\,p_{m-b}(i-2b-j+m)}{\sum_{j=1}^{b-1} p_j(i) + \sum_{j=0}^{i-b} p_b(j+b)\,p_{m-b}(i-2b-j+m)}, \qquad i > 1$$

and p_k(j) is defined in appendix 2.

The expression of the processing power P is then as follows:

$$P = \pi(p)\left[\,p + \sum_{j=1}^{p-1} \frac{p!}{(j-1)!}\,\rho^{\,p-j} \prod_{k=1}^{p-j} \frac{1}{\beta_k}\right]$$
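Because C2 is a birth and death chain, (31) and the expression for P can be evaluated directly once the averaged factors β_k are known. The sketch below takes β_1, ..., β_p as an input list, and the function name is hypothetical. The example values (1, 3/2, 5/3, 9/5) for a 4x3x2 system are our own count, assuming n(3,2,2) = 1, n(3,2,3) = 2 and n(3,2,4) = 4, since appendix 2 is not reproduced here. With these values the sketch gives P ≈ 1.489 at ρ = 1, about 1.1% below the exact .1506e+01 of fig. 35. This appears consistent with the C2 column there.

```python
from math import factorial


def c2_processing_power(p, rho, beta):
    """Processing power of the C2 birth and death chain, following (31).

    p    -- number of processors
    rho  -- lambda / mu
    beta -- beta[k-1] = beta_k, averaged number of processors in service
            when k processors are not active (k = 1..p); supplied by the caller
    """
    # w[j] = pi(j) / pi(p) = (p!/j!) rho^(p-j) / (beta_1 * ... * beta_(p-j))
    w = []
    for j in range(p + 1):
        term = factorial(p) / factorial(j) * rho ** (p - j)
        for k in range(1, p - j + 1):
            term /= beta[k - 1]
        w.append(term)
    pi_p = 1.0 / sum(w)                          # eq. (31); note w[p] == 1
    return pi_p * sum(j * wj for j, wj in enumerate(w))


print(round(c2_processing_power(4, 1.0, [1.0, 1.5, 5 / 3, 1.8]), 3))   # 1.489
```

As a sanity check, setting β_k = k (every non-active processor in service) turns the chain into p independent processors, and the sketch then returns p/(1+ρ).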


8. STOCHASTIC PETRI NET MODELS
Petri net models and derivations thereof [PETR66, HOLT68, HOLT70, PETR73] have been introduced by several authors for the modeling of computer systems [NOE71, NUTT72, NOE73, KELL76b, PETE77, AGER79, SHAP79]. Although in standard Petri nets no measure of time is considered (only a partial ordering of the occurrences of events is established), some of the models presented in the literature allow a measure of the flow of time by introducing the concept of transition times. Transition times are assumed to be deterministic, even in the random Petri net models introduced by Shapiro [SHAP79]. Molloy [MOLL80] first introduced the idea of random transition times, by allowing them to be exponentially distributed random variables. We show in this section how such models can be used to describe the behavior of multiple bus multiprocessor systems and to obtain the Markovian models discussed in the previous sections.

For an introduction to Petri nets the reader is referred to the tutorial papers by Peterson and Agerwala [PETE77, AGER79].

Following [AGER79] we define a Petri net (PN) to be represented by a bipartite, directed graph: PN = (T,P,A), where:

$T = \{t_1, t_2, \ldots, t_r\}$ is a set of transitions
$P = \{p_1, p_2, \ldots, p_n\}$ is a set of places
$A \subseteq \{T \times P\} \cup \{P \times T\}$ is a set of directed arcs (37)

The set {T ∪ P} forms the set of nodes of the Petri net.
The dynamic properties of the PN can be studied by analyzing the movements of tokens inside the net. A PN with tokens is a marked Petri net MPN = (T,P,A,M). A marking M of a PN assigns tokens to places; M can be viewed as a vector whose i-th component represents the number of tokens assigned to the i-th place p_i. A marking can also be viewed as a mapping from the set of places P to the natural numbers I:

$$M : P \to I$$

$$M = \{M(p_1), M(p_2), \ldots, M(p_n)\} \qquad (38)$$

It is common practice to represent places by circles, transitions by bars and tokens by black dots. A simple Petri net is shown in fig. 27.

For a given transition t we define the set of input places I(t) as:

$$I(t) = \{\, p \mid (p,t) \in A \,\} \qquad (39)$$

In a similar manner the set of output places is defined as:

$$O(t) = \{\, p \mid (t,p) \in A \,\} \qquad (40)$$

A transition is enabled if the marking M of the Petri net is such that:

$$M(p) > 0 \quad \text{for all } p \in I(t) \qquad (41)$$

Enabled transitions can fire, thus removing one token from each input place and putting one token in each output place. The firing of a transition alters the marking of the PN and may then enable other transitions. The dynamic behavior of the Petri net can thus be investigated by studying the sequences of markings produced by firing the transitions.
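Definitions (37) and (39)-(41) translate almost literally into code. The sketch below stores the arc set A as pairs and derives I(t) and O(t) from it. It checks enabling with (41) and implements the firing rule with a multiset of tokens. As an example it uses the two-processor net of fig. 27. The arcs are our reading of the place and transition interpretation given a little further on (p1 to p5, t1 to t4), and the initial marking (both processors active, bus available) is an assumption, since the figure is not reproduced.

```python
from collections import Counter

# Arc set A of the net of fig. 27, as read from the interpretation in the text.
ARCS = {
    ("p1", "t1"), ("p3", "t1"), ("t1", "p2"),   # processor 1 seizes the bus
    ("p2", "t2"), ("t2", "p1"), ("t2", "p3"),   # processor 1 releases the bus
    ("p5", "t3"), ("t3", "p4"), ("t3", "p3"),   # processor 2 releases the bus
    ("p4", "t4"), ("p3", "t4"), ("t4", "p5"),   # processor 2 seizes the bus
}
TRANSITIONS = sorted({x for arc in ARCS for x in arc if x.startswith("t")})


def inputs(t):
    """I(t) = {p | (p, t) in A}, eq. (39)."""
    return {a for (a, b) in ARCS if b == t}


def outputs(t):
    """O(t) = {p | (t, p) in A}, eq. (40)."""
    return {b for (a, b) in ARCS if a == t}


def is_enabled(marking, t):
    """Eq. (41): every input place holds at least one token."""
    return all(marking[p] > 0 for p in inputs(t))


def fire(marking, t):
    """Remove one token from each input place and add one to each output place."""
    if not is_enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    new = Counter(marking)
    new.subtract(inputs(t))
    new.update(outputs(t))
    return +new                                 # drop places left with no tokens


m0 = Counter({"p1": 1, "p3": 1, "p4": 1})       # assumed initial marking
print([t for t in TRANSITIONS if is_enabled(m0, t)])   # ['t1', 't4']
m1 = fire(m0, "t1")
print([t for t in TRANSITIONS if is_enabled(m1, t)])   # ['t2']
```

Firing t1 removes the bus token, so t4 is no longer enabled. This is the access conflict the net is meant to capture.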

Standard Petri net models do not consider time as a parameter of the net; the firing of a transition is assumed to be instantaneous. Modified models (see for instance the E-net models [NUTT72, NOE73]) allow the introduction of fixed transition times. With stochastic Petri nets the transition times are assumed to be exponentially distributed random variables (possibly with zero mean, thus accounting for immediate transitions). More precisely, the time that elapses between the enabling and the firing of a transition is an exponentially distributed random variable; the firing time is still assumed to be zero, thus in the case of two conflicting transitions the firing of one disables the other.

A continuous time stochastic Petri net (SPN) is thus an extension of the standard Petri net:

$$SPN = (P,T,A,M,\delta) \qquad (42)$$

where δ is the set of the transition rates associated with each transition:

$$\delta = \{\delta_1, \delta_2, \ldots, \delta_r\} \qquad (43)$$

A discrete time SPN can also be introduced, by considering geometrically distributed transition times [MOLL80].

Petri nets are useful in modeling asynchronous concurrent activities in real systems. We can attach a physical interpretation to markings and transitions: a marking can represent the state of the system and a transition can represent an event which modifies the system state. Consider for example the very simple system of fig. 28: two processors access an external common memory. The behavior of this system can be represented by the PN of fig. 27, by giving the following interpretation to places and transitions:
Fig. 28 - Two-processor system


p1  processor 1 active
p2  processor 1 accessing common memory
p3  bus available
p4  processor 2 active
p5  processor 2 accessing common memory
t1  processor 1 seizes the bus
t2  processor 1 releases the bus
t3  processor 2 releases the bus
t4  processor 2 seizes the bus

With this model we represent the possible conflicts in access requests, but do not explicitly model the queueing of a processor in order to access the common memory. This feature can be obtained by adding two places and two transitions to the net as shown in fig. 29. The interpretation of the added nodes is:

p6  processor 1 queued
p7  processor 2 queued
t5  processor 1 issues a request
t6  processor 2 issues a request

The marking shown in the figures indicates the initial state of the system. In order to obtain the full definition of the stochastic Petri net we must associate a rate with each transition. Using the same notation as in section 3 we have:

$$\begin{aligned}
\delta_1 &= \delta_4 = \infty && \text{immediate transitions}\\
\delta_2 &= \delta_3 = \mu && \text{memory access completion rate}\\
\delta_5 &= \lambda_1 && \text{access rate of processor 1}\\
\delta_6 &= \lambda_2 && \text{access rate of processor 2}
\end{aligned} \qquad (44)$$
The analysis of Petri net models is usually based on the properties of the reachability set associated with the PN. The reachability set of a PN is the set of all markings reachable from the initial marking M. A marking M' is immediately reachable from M if it can be obtained from M by firing some enabled transition. A marking M' is reachable from M if it is immediately reachable from M or if it is reachable from any marking immediately reachable from M. The reachability set of the SPN of fig. 29 is easily obtained, and it is shown in fig. 30. Marking 8 is somewhat different from all the others, as it is obtained from markings 2 and 4 by firing a finite rate transition before an immediate transition. Marking 8 is therefore reachable with probability zero.
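The reachability set can be enumerated with a breadth-first search. The sketch assumes the arcs of fig. 29 as they follow from the interpretation of places p1 to p7 and transitions t1 to t6 above, and an initial marking with p1, p3 and p4 marked. Because the net is safe, each marking is stored as the set of its marked places. Immediate transitions are deliberately not given priority during the search, which is how the zero-probability marking 8 enters the set. The search finds 8 markings, 3 of which enable an immediate transition.

```python
from collections import deque

# Net of fig. 29 as read from the text: transition -> (input places, output places).
NET = {
    "t1": (("p6", "p3"), ("p2",)),   # processor 1 seizes the bus (immediate)
    "t2": (("p2",), ("p1", "p3")),   # processor 1 releases the bus
    "t3": (("p5",), ("p4", "p3")),   # processor 2 releases the bus
    "t4": (("p7", "p3"), ("p5",)),   # processor 2 seizes the bus (immediate)
    "t5": (("p1",), ("p6",)),        # processor 1 issues a request
    "t6": (("p4",), ("p7",)),        # processor 2 issues a request
}
IMMEDIATE = {"t1", "t4"}
INITIAL = frozenset({"p1", "p3", "p4"})   # assumed initial marking


def enabled(marking):
    """Transitions whose input places are all marked (safe net: sets suffice)."""
    return [t for t, (ins, _) in NET.items() if set(ins) <= marking]


def fire(marking, t):
    ins, outs = NET[t]
    return (marking - set(ins)) | set(outs)


def reachability_set(initial):
    """Breadth-first search over all markings reachable from `initial`."""
    order, seen, queue = [initial], {initial}, deque([initial])
    while queue:
        m = queue.popleft()
        for t in enabled(m):
            m2 = fire(m, t)
            if m2 not in seen:
                seen.add(m2)
                order.append(m2)
                queue.append(m2)
    return order


reach = reachability_set(INITIAL)
vanishing = [m for m in reach if IMMEDIATE & set(enabled(m))]
print(len(reach), len(vanishing), len(reach) - len(vanishing))   # 8 3 5
```

The numbering produced by the search need not follow the numbering of fig. 30; only the counts are compared.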

The number of tokens in any place can be at most one for all markings. This means that the SPN is safe. A place in a PN is said to be safe if it contains at most one token in every reachable marking; if all places of a PN are safe, then the PN is safe. We also note that all transitions are such that for each marking M there is a marking M', reachable from M, in which the transition is enabled. This means that all the transitions in the net are live, hence the PN itself is live. Liveness is an important property as it guarantees that the PN is deadlock-free.

Fig. 30 - Reachability set of the Stochastic Petri net of fig. 29

Due to the memoryless property of the negative exponential distribution, the SPN is isomorphic to a continuous time Markov chain, as shown by Molloy [MOLL80]. The state space of the Markov chain can be obtained from the reachability set by eliminating those markings that enable an immediate transition ($\delta_i = \infty$). In the case of the SPN of fig. 29 we must eliminate markings 2 and 4, which enable t1 and t4 respectively, and marking 8, which enables both. We thus have a 5-state Markov chain that can be represented with the transition diagram of fig. 31, where the state definition is as follows:

$$(S_1, S_2) \qquad (45)$$

with:

S_i = state of processor i:
    w = active
    a = accessing common memory
    q = queued

The marking that corresponds to each state is also indicated in the figure. The transition rates are those associated with the transition that has to be fired in order to go from one state to the other. In the case of immediate transitions, we consider the state where the immediate transition is enabled to coincide with the state resulting from the firing of the immediate transition.
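The reduction rule just stated is mechanical. Fire a timed transition from a tangible marking, then keep firing immediate transitions until a tangible marking is reached. The sketch applies it to the net of fig. 29 with the rates of (44), using the same assumed arcs and initial marking as in the reachability sketch. It then solves the resulting 5-state chain exactly with fractions and computes the processing power as the expected number of marked "active" places (p1, p4). With λ1 = λ2 = μ the result is 4/5. By hand, the same chain gives P = 2(1+ρ)/(1+2ρ+2ρ²) for equal request rates.

```python
from fractions import Fraction

# Arcs of fig. 29 as read from the text: transition -> (input places, output places).
NET = {
    "t1": (("p6", "p3"), ("p2",)), "t2": (("p2",), ("p1", "p3")),
    "t3": (("p5",), ("p4", "p3")), "t4": (("p7", "p3"), ("p5",)),
    "t5": (("p1",), ("p6",)),      "t6": (("p4",), ("p7",)),
}
IMMEDIATE = {"t1", "t4"}          # delta_1 = delta_4 = infinity in (44)
ACTIVE = {"p1", "p4"}             # places meaning "processor active"


def enabled(m):
    return [t for t, (ins, _) in NET.items() if set(ins) <= m]


def fire(m, t):
    ins, outs = NET[t]
    return (m - set(ins)) | set(outs)


def settle(m):
    """Fire immediate transitions until a tangible marking is reached."""
    while True:
        imm = [t for t in enabled(m) if t in IMMEDIATE]
        if not imm:
            return m
        if len(imm) > 1:
            raise ValueError("conflicting immediate transitions")
        m = fire(m, imm[0])


def build_ctmc(initial, rates):
    """Tangible markings and rates {m: {m2: rate}} over timed transitions."""
    start = settle(initial)
    states, todo, q = [start], [start], {}
    while todo:
        m = todo.pop()
        q[m] = {}
        for t in enabled(m):      # tangible marking: only timed transitions here
            m2 = settle(fire(m, t))
            q[m][m2] = q[m].get(m2, 0) + rates[t]
            if m2 not in states:
                states.append(m2)
                todo.append(m2)
    return states, q


def steady_state(states, q):
    """Exact solution of pi Q = 0, sum(pi) = 1 (Gauss-Jordan over Fractions)."""
    n, idx = len(states), {s: k for k, s in enumerate(states)}
    a = [[Fraction(0)] * n for _ in range(n)]
    for s, out in q.items():
        for t, r in out.items():
            a[idx[t]][idx[s]] += r
            a[idx[s]][idx[s]] -= r
    a[-1], rhs = [Fraction(1)] * n, [Fraction(0)] * (n - 1) + [Fraction(1)]
    for c in range(n):
        piv = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[piv], rhs[c], rhs[piv] = a[piv], a[c], rhs[piv], rhs[c]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
                rhs[r] -= f * rhs[c]
    return {s: rhs[idx[s]] / a[idx[s]][idx[s]] for s in states}


def processing_power(lam1, lam2, mu):
    rates = {"t2": mu, "t3": mu, "t5": lam1, "t6": lam2}   # finite rates of (44)
    states, q = build_ctmc(frozenset({"p1", "p3", "p4"}), rates)
    pi = steady_state(states, q)
    return sum(len(m & ACTIVE) * pi[m] for m in states)


print(processing_power(Fraction(1), Fraction(1), Fraction(1)))   # 4/5
```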

Consider a 2x2x2 system, as described in section 5. We can represent the behavior of such a system using the SPN of fig. 32. The interpretation of places and transitions is a simple extension of fig. 29. The transition rates are:

^2  ^  ^^12 

63  =  64  =  65  =  6j^0  =  oo 
65  =  6^1  = 

^6  =  ^2  =  ^ 

^  =  »\21 

6q  =  ,\22 


Fig. 32 - Stochastic Petri net model of a 2x2x2 system.
The reachability set is now shown in fig. 33: 23 markings are possible, 4 are reachable with probability zero and 8 enable immediate transitions, hence the associated Markov chain has eleven states. The construction of the Markov chain using the rates associated with each transition yields exactly the chain of fig. 3a. From the SPN description of the system we can thus obtain the Markov chain description presented in the previous sections.

Note that the stochastic Petri net of fig. 32 is safe and live, hence the system (as modeled) is deadlock-free.

Petri nets have been used to describe and model the synchronization of events. In the case of multiprocessor systems that exchange messages through common memories, processors are synchronized in the sense that a message can be read only after it has been written. As we mentioned in section 3, a processor may look for a message in a common memory and not find it. Moreover, the common memory area is limited: it can accommodate only a fixed number of messages (assume that the common memory consists of several buffers which can accommodate one message each). These features of the real system can be included in the SPN model rather easily. Consider again the simple system of fig. 28. The message exchange through finite size memories can be modeled explicitly using the SPN of fig. 34, where the interpretation of places is as follows:
Marking  p1  p2  p3  p4  p5  p6  p7  p8  p9  p10 p11 p12
   1      1   0   0   0   0   1   1   1   0   0   0   0
   2      0   1   0   0   0   1   1   1   0   0   0   0
   3      0   0   1   0   0   1   1   1   0   0   0   0
   4      1   0   0   0   0   1   1   0   1   0   0   0
   5      1   0   0   0   0   1   1   0   0   1   0   0
   6      0   0   0   1   0   0   1   1   0   0   0   0
   7      0   1   0   0   0   1   1   0   1   0   0   0
   8      0   1   0   0   0   1   1   0   0   1   0   0
   9      0   0   0   0   1   1   0   1   0   0   0   0
  10      0   0   1   0   0   1   1   0   1   0   0   0
  11      0   0   1   0   0   1   1   0   0   1   0   0
  12      1   0   0   0   0   0   1   0   0   0   1   0
  13      1   0   0   0   0   1   0   0   0   0   0   1
  14      0   0   0   1   0   0   1   0   1   0   0   0
  15      0   0   0   1   0   0   1   0   0   1   0   0
  16      0   1   0   0   0   0   1   0   0   0   1   0
  17      0   1   0   0   0   1   0   0   0   0   0   1
  18      0   0   0   0   1   1   0   0   1   0   0   0
  19      0   0   0   0   1   1   0   0   0   1   0   0
  20      0   0   1   0   0   0   1   0   0   0   1   0
  21      0   0   1   0   0   1   0   0   0   0   0   1
  22      0   0   0   1   0   0   0   0   0   0   0   1
  23      0   0   0   0   1   0   0   0   0   0   1   0

Fig. 33 - Reachability set of the Stochastic Petri net of fig. 32
Fig. 34 - Stochastic Petri net model of the two-processor system including synchronization and buffer size.


p1 (p11)   processor 1 (2) active
p2 (p12)   processor 1 (2) queued for write
p3 (p13)   processor 1 (2) queued for read
p4 (p14)   processor 1 (2) testing the availability of buffers
p5 (p15)   processor 1 (2) testing the presence of messages
p6 (p16)   processor 1 (2) writing
p7 (p17)   processor 1 (2) reading
p8 (p9)    messages for processor 1 (2)
p10        bus available
p18        buffers in common memory

The interpretation of the transitions is:

t1 (t11)   proc. 1 (2) issues a write request
t2 (t12)   proc. 1 (2) issues a read request
t3 (t13)   proc. 1 (2) seizes the bus for write
t4 (t14)   proc. 1 (2) seizes the bus for read
t5 (t15)   proc. 1 (2) found no message
t6 (t16)   proc. 1 (2) found no buffer
t7 (t17)   proc. 1 (2) found a message
t8 (t18)   proc. 1 (2) found a buffer
t9 (t19)   proc. 1 (2) write ends
t10 (t20)  proc. 1 (2) read ends

The immediate transitions in the SPN are:

t3, t4, t7, t8, t13, t14, t17, t18

To all other transitions we can assign finite rates, according to the definitions of section 3. The SPN is live, thus the system is deadlock-free, but, in general, it is not safe, as places p8, p9 and p18 can contain more than one token at a time, unless the common memory consists of a single buffer. The SPN is, however, k-bounded: for each marking the number of tokens in any place of the network does not exceed k, k being the number of buffers available in the common memory. The k-boundedness of the SPN guarantees that the reachability set is finite. Since the size of the state space of the equivalent Markov chain is smaller than or equal to the size of the reachability set, it too is finite.

The Petri net model provides a formal description of the operation of the system: from fig. 34 and the interpretation of places and transitions, we obtain all the information necessary to describe the way the system operates.

The SPN is in this case much more complex than that of fig. 29, where we did not explicitly model the synchronization between transmitting and receiving processors: 75 markings are reachable in the single buffer case. Nevertheless, from fig. 34 we can obtain a Markov chain that models the behavior of the system including those features, using the same rules as before. The complexity of the result limits the applicability of these highly detailed models to very small systems.
9. RESULTS


Exact and approximate analytic results were compared by considering a 4x3x2, a 4x4x3 and a 6x4x2 system, respectively. The exact chains for the first two systems are shown in figs. 6 and 8, respectively. The exact chain for the third system (not shown here) has 37 states.

The results for the 4x3x2 system are presented in fig. 35. The first column gives the value of ρ = λ/μ, and the second column shows the exact value of the processing power as a function of ρ, evaluated using the exact lumped chain. The other columns show the percentage error which affects the processing power value computed with each of the four approximations introduced in this report. For this case, the exact chain has 12 states, approximations A1 and A2 have 8, approximation B2 has 10 and approximation C2 has 5 states.

The results for the 4x4x3 system are shown in fig. 36, using the same format. The exact chain has again 12 states, whereas the approximate chains have 10, 10, 11 and 5 states, respectively.

In fig. 37 the results for the 6x4x2 system are presented. The numbers of states are in this case 37, 12, 12, 16 and 7.

A number of observations can be made based on these results. Firstly, approximations A1, A2 and B2 seem to yield upper bounds on the processing power, whereas C2 gives a lower bound. The upper bound can be intuitively explained for approximation A1, since the random redistribution of processors to memories tends to relieve memory congestion and therefore improve performance. The bounds seem to be rather tight, since percentage errors well below 10%
    ρ          exact       A1      A2      B2      C2
.1000e-02   .3996e+01    .00     .00     .00     .00
.1000e-01   .3960e+01    .00     .00     .00     .00
.1000e+00   .3604e+01    .02     .04     .00    -.01
.3000e+00   .2892e+01    .30     .53     .04    -.20
.5000e+00   .2338e+01    .84    1.33     .14    -.50
.1000e+01   .1506e+01   2.15    2.95     .50   -1.12
.3000e+01

(exact: processing power; A1 to C2: percentage error of each approximation)

Fig. 35 - Results for the 4x3x2 system.
Fig. 36 - Results for the 4x4x3 system.

Fig. 37 - Results for the 6x4x2 system.

were typically observed (except for approximation A2 in the 4x4x3 case). Tight upper and lower bounds are extremely useful, as they allow one to determine a small range in which the exact result must lie, avoiding the computational complexity of the exact problem.

Secondly, we observe that the largest system (6x4x2) shows the smallest percentage errors. This may be due to the fact that the rate averaging approximation gives better results for a higher number of states. If the trend of smaller errors with larger systems were verified for even larger models, then we could conclude that our approximate models are more than adequate for the study of large multibus systems.

In order to study the influence of the simplifying assumptions introduced, and to test the performance of the approximate techniques on larger systems, a simulation program was written in GPSS. Due to the peculiarities of the language, some discrepancies are expected between the simulated systems and the models for which we performed a Markov chain analysis. Nevertheless, a comparison between the analytic and the simulation results shows very good agreement. As an example, fig. 38 shows results for the 2x2x2 system.

The influence of the simplifying assumptions was studied taking the 6x4x2 system as a benchmark. Exact and approximate analytic results for this system were shown in fig. 37. First, the impact of the memory access time distribution on system performance is tested. Fig. 39 shows the value of the processing power, obtained via simulation, for a 6x4x2 system with fixed memory access time. The fixed access time results are smaller than the exponential access time results in fig. 37, as is expected from known results in queueing theory.
Fig. 38 - Comparison of analytic and simulation results for a 2x2x2 system.

Fig. 39 - Processing power of a 6x4x2 system with fixed access times.
Next, the uniform memory reference assumption is relaxed by assuming that access requests from any processor are directed to memory 1 with probability α, and are uniformly distributed among all other memories. That is, we set:

$$P_{i1} = \alpha \quad \text{for all } i, \qquad P_{ij} = \frac{1-\alpha}{m-1} \quad \text{for all } i,\; j = 2, \ldots, m$$

The results are reported in fig. 40. We can see that the value α = 1/m is the one that maximizes the processing power. This result was expected, since high values of α imply that one memory is the bottleneck of the system, whereas low values of α mean that the accesses are mainly directed to the other three memories. Both situations increase memory contention and thus decrease system throughput.
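The reference pattern above is easy to tabulate. The short sketch below builds one processor's row of P_ij (memory 1 at index 0) and shows that α = 1/m reduces it to the uniform reference model used by the analytic approximations. The function name is hypothetical.

```python
from math import isclose


def reference_probabilities(m, alpha):
    """One processor's row of P_ij: memory 1 gets alpha, the others share 1 - alpha."""
    return [alpha] + [(1 - alpha) / (m - 1)] * (m - 1)


print(reference_probabilities(4, 0.25))   # [0.25, 0.25, 0.25, 0.25] (alpha = 1/m)
print(isclose(sum(reference_probabilities(4, 0.9)), 1.0))   # True
```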


The increase in efficiency gained by varying the number of busses was also analyzed. Fig. 41 shows simulation results for a 6-processor, 4-memory system using a number of busses varying from 1 to 4. The increase in processing power is negligible for low values of ρ, but becomes very significant for heavily loaded systems. In the latter case the increase in performance clearly shows a "diminishing return" behaviour.

Finally, a 16-processor, 8-memory, 3-bus system was simulated, in order to test the accuracy of the approximate models for large system size. Results are shown in fig. 42. The approximate Markov chains of models A1 and C2, having 46 and 17 states respectively, were solved. The results show that the approximate models behave very well for a system of this size; indeed, the approximate results are so close to the simulation results that they fall within


  ρ      α = .01    .1     .25     .5     .75     .9     .99
 .001     5.994   5.994  5.994  5.994  5.994  5.994  5.994
 .01      5.939   5.940  5.940  5.939  5.938  5.937  5.937
 .1       5.367   5.379  5.388  5.368  5.268  5.219  5.165
 .333     3.903   3.966  4.001  3.818  3.365  3.119  2.823
 .5       3.103   3.151  3.178  3.014  2.488  2.186  1.977
 .75      2.311   2.313  2.358  2.172  1.704  1.461  1.351
          1.787   1.829  1.874  1.699  1.312  1.093  0.995
          0.633   0.636  0.641  0.584  0.441  0.378  0.339
          0.380   0.384  0.390  0.354  0.262  0.224  0.205

Fig. 40 - Processing power of a 6x4x2 system with non-uniform memory reference (values of α in the top row).


  ρ       6x4x1   6x4x2   6x4x3   6x4x4

  .001    5.99    5.99    5.99    5.99
  .01     5.94    5.94    5.94    5.94
  .1      5.16    5.39    5.39    5.39
  .3      2.9     4.00    4.11    4.12
  .5      1.96    3.18    3.35    3.41
  1.      1.02    1.87    2.16    2.16
  3.      0.33    0.64    0.81    0.83
  5.      0.21    0.40    0.50    0.50
  10.     0.10    0.19    0.25    0.26

Fig. 41 - Processing power of a 6-processor, 4-memory
system as the number of busses is varied.




  ρ       simulation   A1      C2

  .001    15.98        15.98   15.98
  .01     15.84        15.84   15.83
  .1      14.24        14.27   13.89
  .333     8.59         8.73    8.20
  .5       6.01         5.99    5.79
  1.       2.99         2.99    2.97
  3.       1.01         1.00    0.99
  5.       0.60         0.60    0.60
  10.      0.30         0.30    0.30

Fig. 42 - Simulation and approximate analytic results
for a 16x8x3 system.




the simulation confidence interval. Moreover, since the system of linear
equations associated with the approximate Markov chain can easily be solved
with numerical methods, the approximate models require much less computer time
than a simulation program.
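As a rough illustration of the numerical step mentioned above, the sketch below solves πQ = 0 with Σπ = 1 for a small continuous-time Markov chain by Gaussian elimination, and then takes processing power as the expected number of active processors. The generator is a hypothetical toy (2 processors sharing one memory through one bus, request rate λ = 1, service rate μ = 2), not the A1 or C2 chain; only the solution procedure is meant to carry over.

```python
def steady_state(Q):
    """Solve pi Q = 0, sum(pi) = 1 for a small irreducible CTMC.

    Q is a generator matrix given as a list of rows; each row sums to 0.
    """
    n = len(Q)
    # Balance equations in transposed form: A[i][j] = Q[j][i], so A pi = 0.
    A = [[float(Q[j][i]) for j in range(n)] for i in range(n)]
    rhs = [0.0] * n
    # One balance equation is redundant: replace it with sum(pi) = 1.
    A[-1] = [1.0] * n
    rhs[-1] = 1.0
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (rhs[r] - s) / A[r][r]
    return pi


# Hypothetical toy chain: state k = number of the 2 processors that are
# inactive (using or waiting for the single bus and memory).
lam, mu = 1.0, 2.0
Q = [
    [-2 * lam, 2 * lam, 0.0],
    [mu, -(mu + lam), lam],
    [0.0, mu, -mu],
]
pi = steady_state(Q)
processing_power = sum(p_k * (2 - k) for k, p_k in enumerate(pi))
print([round(x, 3) for x in pi], round(processing_power, 3))  # [0.4, 0.4, 0.2] 1.2
```

Dense elimination is adequate at the sizes quoted in the text (46 and 17 states); much larger chains would normally call for sparse or iterative solvers.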
APPENDIX 1


In this appendix we give expressions for the transition rates of the
approximate Markov chain of model A1 in the general case of a p x m x b system.

Given that we are in state (i,j), transitions can occur to at most four
neighboring states:

$$ (i+1,j) \qquad (i-1,j) \qquad (i,j+1) \qquad (i,j-1) \tag{A1.1} $$

and we denote these transitions, respectively, by

$$ i \to i+1 \qquad i \to i-1 \qquad j \to j+1 \qquad j \to j-1 \tag{A1.2} $$

Using the simplifications introduced earlier, we associate the following rates
with these transitions:


$$ R(i \to i+1) = (p-i-j)\,\lambda, \qquad 0 \le i < b,\; p-i-j > 0 \tag{A1.3} $$


$$
R(i \to i-1) =
\begin{cases}
i\,\mu, & i < b\\[4pt]
b\,\mu \left(\dfrac{b-1}{m}\right)^{j}, & i = b
\end{cases}
\tag{A1.4}
$$


$$
R(j \to j+1) =
\begin{cases}
(p-i-j)\,\dfrac{i}{m}\,\lambda, & i < b,\; p-i-j > 0\\[4pt]
(p-b-j)\,\lambda, & i = b,\; p-b-j > 0
\end{cases}
\tag{A1.5}
$$
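The state space on which these rates act can be enumerated directly. The sketch below rests on assumptions that are a reading of the appendix, not statements from it: i is the number of busy busses (0 ≤ i ≤ b, with b ≤ p), j is the number of queued processors, nobody can be queued when i = 0 (no memory is busy, so no request has to wait), and i + j ≤ p. Under these assumptions a 16x8x3 system yields 46 states, the figure quoted for model A1 in the discussion of Fig. 42. The function name is hypothetical.

```python
def a1_states(p, b):
    """Hypothetical enumeration of the states (i, j) of the A1 chain.

    Assumed meaning: i = busy busses (0..b), j = queued processors.
    With no bus busy nobody can be queued, so i = 0 forces j = 0;
    otherwise at most p - i processors can be queued.
    """
    states = [(0, 0)]
    for i in range(1, b + 1):
        states.extend((i, j) for j in range(p - i + 1))
    return states


print(len(a1_states(16, 3)))  # 16x8x3 system of Fig. 42: 46 states
print(len(a1_states(6, 2)))   # 6x4x2 system: 12 states
```

The number of memories m does not enter the count; under these assumptions it only affects the rates.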
APPENDIX 2


In this appendix we give expressions for the number of states at level l
of the exact lumped chain in the case of a p x m x 2 system.

We need to count the states having certain properties in order
to evaluate the transition rates of the approximate Markov models using the
averaging technique introduced in section 7.

The level of a state is defined as the difference between the total
number of processors and the number of active processors.

At levels 0 and 1 there is only one state; at level 2 there are two
states, one with one processor accessing common memory and one with two.

For l ≥ 3 we know that there is exactly one state with n = 1 (see eq. 15), but we do
not know how many states exist with n = 2. The number of such states can be
evaluated by applying some results from combinatorial analysis.

Define the numbers P_k(n) by the recurrence relation

$$ P_k(n) = P_k(n-k) + P_{k-1}(n-k) + \cdots + P_1(n-k) + P_0(n-k) \tag{A2.1} $$

with

$$
\begin{aligned}
P_k(n) &= 0, && n < k \text{ or } k < 0\\
P_0(n) &= 0, && n > 0\\
P_k(k) &= 1, && k \ge 0
\end{aligned}
\tag{A2.2}
$$

Note that P_k(n) is the number of unordered partitions of the integer n into
k parts; for example, P_2(4) = 2, since 4 = 3+1 = 2+2.
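The recurrence and its boundary conditions are easy to check mechanically. The sketch below evaluates P_k(n) by memoised recursion, reading the last boundary condition as P_k(k) = 1 for every k ≥ 0 (so that P_0(0) = 1), and compares the result with a brute-force count of the partitions of n into exactly k positive parts. The function names are illustrative only.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def P(k, n):
    """P_k(n) computed from recurrence (A2.1) and boundary conditions (A2.2)."""
    if k < 0 or n < k:
        return 0                 # P_k(n) = 0 for n < k (or k < 0)
    if n == k:
        return 1                 # P_k(k) = 1, including P_0(0) = 1
    if k == 0:
        return 0                 # P_0(n) = 0 for n > 0
    # (A2.1): P_k(n) = P_k(n-k) + P_{k-1}(n-k) + ... + P_0(n-k)
    return sum(P(i, n - k) for i in range(k + 1))


def partitions_exact(n, k, largest=None):
    """Brute-force count of partitions of n into exactly k positive parts,
    generated with non-increasing parts so that order does not matter."""
    if largest is None:
        largest = n
    if k == 0:
        return 1 if n == 0 else 0
    return sum(partitions_exact(n - part, k - 1, part)
               for part in range(1, min(largest, n) + 1))


assert all(P(k, n) == partitions_exact(n, k)
           for n in range(15) for k in range(8))
print([P(2, n) for n in range(2, 9)])   # [1, 1, 2, 2, 3, 3, 4]
```

The recurrence works because removing one unit from each of the k parts leaves a partition of n-k into at most k parts, which is exactly the sum on the right of (A2.1).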
Now we can state that the number of states at level l = k+2, l ≥ 3, such
that n = 2 in a p x m x 2 system is:


$$ n(m,2,k) = \sum_{j=0}^{k} P_2(j+2), \qquad k \le p-2 \tag{A2.3} $$


Out of this number, some states will be such that no processor is queueing for
a bus to reach a free memory. The number of these states is:


$$ n_0(m,2,k) = P_2(k+2), \qquad k \le p-2 \tag{A2.4} $$


Conversely, the number of states in which some processor is queueing for
a bus is:


$$ n_q(m,2,k) = \sum_{j=0}^{k-1} P_2(j+2), \qquad k \le p-2 \tag{A2.5} $$


Finally, the number of states at level l = k+2, l ≥ 3, with some processor queueing
for a bus (if more than one, all of them queueing for the same memory
module) and at least one empty queue for the busy memories is:


$$ n_1(m,2,k) = \sum_{j=0}^{k-1} P_1(j+1) = k, \qquad k \le p-2 \tag{A2.6} $$


In the particular case of three memories (m = 3), the above results can be
put in polynomial form:


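$$
n(3,2,k) =
\begin{cases}
\dfrac{k^2-1}{4} + k + 1, & k \text{ odd}\\[6pt]
\dfrac{k^2}{4} + k + 1, & k \text{ even}
\end{cases}
\tag{A2.7}
$$

$$
n_0(3,2,k) =
\begin{cases}
\dfrac{k+1}{2}, & k \text{ odd}\\[6pt]
\dfrac{k}{2} + 1, & k \text{ even}
\end{cases}
\tag{A2.8}
$$

$$
n_q(3,2,k) =
\begin{cases}
\dfrac{k^2}{4} + \dfrac{k}{2} + \dfrac{1}{4}, & k \text{ odd}\\[6pt]
\dfrac{k^2}{4} + \dfrac{k}{2}, & k \text{ even}
\end{cases}
\tag{A2.9}
$$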
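For m = 3 the sums (A2.3) to (A2.6) can be compared term by term with the polynomial forms (A2.7) to (A2.9). The sketch below does this in exact integer arithmetic. It assumes upper summation limits of k in (A2.3) and k-1 in (A2.5) and (A2.6), which is a reading of the equations rather than something stated explicitly, and it ignores the constraint k ≤ p-2. It also checks that the states with and without a processor queueing for a bus add up to the total. All function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def P(k, n):
    """P_k(n): partitions of n into exactly k parts, via (A2.1)-(A2.2)."""
    if k < 0 or n < k:
        return 0
    if n == k:
        return 1
    if k == 0:
        return 0
    return sum(P(i, n - k) for i in range(k + 1))


# Sums (A2.3)-(A2.6) for b = 2 (upper limits assumed, see text).
def total_states(k):            # (A2.3)
    return sum(P(2, j + 2) for j in range(k + 1))

def no_bus_queue(k):            # (A2.4)
    return P(2, k + 2)

def bus_queue(k):               # (A2.5)
    return sum(P(2, j + 2) for j in range(k))

def bus_queue_one_empty(k):     # (A2.6)
    return sum(P(1, j + 1) for j in range(k))


# Polynomial forms (A2.7)-(A2.9) for m = 3, in exact integer arithmetic.
def poly_total(k):
    return (k * k - 1) // 4 + k + 1 if k % 2 else k * k // 4 + k + 1

def poly_no_bus_queue(k):
    return (k + 1) // 2 if k % 2 else k // 2 + 1

def poly_bus_queue(k):
    return (k + 1) ** 2 // 4 if k % 2 else k * k // 4 + k // 2


for k in range(40):
    assert total_states(k) == poly_total(k) == no_bus_queue(k) + bus_queue(k)
    assert no_bus_queue(k) == poly_no_bus_queue(k)
    assert bus_queue(k) == poly_bus_queue(k)
    assert bus_queue_one_empty(k) == k
print("sums and polynomials agree for k = 0..39")
```

The polynomials follow because P_2(n) equals the integer part of n/2, so the sums in (A2.3) and (A2.5) are sums of floor functions that split naturally by the parity of k.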
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Markovian  models  are  developed  for  the  performance  analysis  of  multiprocessor  sys¬ 
tems  intercommunicating  via  a  set  of  busses.  The  performance  index  is  the  average 
number  of  active  processors,  called  processing  power.  From  processing  power  a 
variety  of  other  performance  measures  can  be  derived  as  dictated  by  the  specific 
processor  application.  Exact  models  are  first  introduced,  and  are  illustrated 
with  a  simple  example.  The  computational  complexity  of  the  exact  models  is  shown 
to  increase  very  rapidly  with  system  size,  thus  making  the  exact  analysis  imprac¬ 
tical  even  for  medium  size  systems.  To  overcome  the  complexity  of  computation,  I 
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several  approximate  models  are  introduced.  The  approximate  results  are  compare< 
with  the  exact  ones  and  found  to  be  surprisingly  accurate  for  a  wide  range  of 
configurations.  Simulation  is  used  to  validate  the  analytic  models  and  to  test 
their  robustness. 
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